Abstract

Live Tor network experiments are difficult due to Tor’s 
distributed nature and the privacy requirements of its 
client base. Alternative experimentation approaches, 
such as simulation and emulation, must make choices 
about how to model various aspects of the Internet and 
Tor that are not possible or not desirable to duplicate or 
implement directly. This paper methodically models the 
Tor network by exploring and justifying every modeling 
choice required to produce accurate Tor experimentation 
environments. We validate our model using two state- 
of-the-art Tor experimentation tools and measurements 
from the live Tor network. We find that our model en- 
ables experiments that characterize Tor’s load and per- 
formance with reasonable accuracy. 

1 Introduction 

Tor [20] is an anonymizing overlay network consisting 
of thousands of volunteer relays that provide forward- 
ing services used by hundreds of thousands of clients. 
To protect their identity, clients encrypt their messages 
multiple times before source-routing them through a cir- 
cuit of multiple relays. Each relay decrypts one layer 
of each message before forwarding it to the next-hop re- 
lay or destination server specified by the client. Without 
traffic analysis, the client and server are unlinkable : no 
single node on the communication path can link the mes- 
sages sent by the client to those received by the server. 

Tor is a distributed system containing a handful of au- 
thorities that assist in distributing a consensus of trusted 
relay information. This directory of relays informs 
clients about the stability of and resources provided by 
each relay. Clients use this information to select relays 
for their circuits: the choice is weighted by the rela- 
tive difference in the perceived throughput of each relay 
in an attempt to balance network load. Although Tor’s 
main purpose is to protect a client’s communication pri- 
vacy, it also serves as a tool to resist censorship. Cit- 
izens in countries controlled by repressive regimes rely 
on Tor to mask their intended communication partners,
thereby circumventing the block that may otherwise occur
at censors’ borders. Although several nations have
attempted to block Tor, its distributed architecture has
thus far proven resilient to long-term censorship.

Tor’s popularity, distributed architecture, and privacy 
requirements increase the difficulty in experimenting 
with new algorithm and protocol designs. New designs 
require software updates before testing their network ef- 
fects, which both prolongs and complicates the experi- 
mental process. Further, since the live network is not 
a controlled environment, fluctuations in network condi- 
tions may both bias results and make them impossible 
to replicate. Finally, experiments that require client data 
collection are generally discouraged due to privacy risks. 

The disadvantages to live Tor experimentation have 
led researchers to explore alternative approaches, including
the use of network testbeds such as PlanetLab [29],
simulation [23,24,28], and emulation [4,5,16].
Each of these live Tor experimentation alternatives must
make choices about how to model the existing network. 
A lack of details about and justifications for such choices 
obscures the level of faithfulness to the live network and 
decreases confidence that the obtained results provide 
meaningful information. 

We improve the state of cyber security by contributing 
a novel and complete model of the Tor network that may 
be used for safe and realistic Tor experiments. In Sec- 
tion 3, we enumerate, explore, and justify each Tor mod- 
eling decision through methodical reasoning, using data 
from real Internet measurements where possible. We 
provide insight into non-intuitive consequences of alter- 
native modeling strategies while precisely specifying and 
discussing our modeling techniques. 

We validate that our model produces an accurate envi- 
ronment whose performance and load are characteristic 
of the live Tor network. To this end, we utilize two state-of-the-art
Tor experimentation platforms: Shadow [23]
and ExperimenTor [16]. We describe the tools and discuss
their pros and cons in Section 2, both to show that
our model is applicable in multiple testing environments 


and to guide future work in selecting the tool most suit- 
able to a given research question. In Section 4, we in- 
stantiate our model with both Shadow and Experimen- 
Tor and compare results obtained with each tool to data 
collected from the live Tor network. We find that both 
tools produce reasonable Tor load and performance char- 
acteristics using networks of various sizes produced with 
our model. We inform the research community about the 
lessons we learned in Section 5 while concluding and 
discussing future work in Section 6. 

The following summarizes this paper’s contributions: 

• Justified, precise specifications of techniques used 
to create accurate Tor network models 

• Validation that our model produces an accurate en- 
vironment whose performance and load are charac- 
teristic of the live Tor network, with multiple exper- 
imentation tools 

• The first direct comparison between results obtained 
with Shadow [23] and ExperimenTor [16] — two 
state-of-the-art Tor experimentation platforms 

2 Background and Related Work 

While Tor is the most widely used anonymity network today,
with hundreds of thousands of daily users, it is still
an active research network on which researchers work to
improve its performance and security. To that end, prior
Tor research has utilized a wide variety of methodologies,
including analytical modeling [18,28] and simulation
[24,28] of specific aspects of Tor’s design; relatively
small Tor deployments on PlanetLab [15, 33]; and direct 
experimentation [17] and measurement [27] on the live 
Tor network. 

Analytical modeling, simulations and small-scale Tor 
deployments on testbeds such as PlanetLab each make 
certain simplifying and potentially unrealistic assump- 
tions that often leave many open questions about how 
the results obtained might translate to the live Tor net- 
work. Direct measurement and experimentation with the 
live network are unable to investigate design changes at 
scale due to software upgrade delays. Further, such well-intentioned
research might have a negative impact on real
Tor users’ quality of service or privacy [25]. 1

In an effort to enhance the realism and safety of 
Tor experimentation, two designs for whole-Tor network
testbeds, Shadow [23] and ExperimenTor [16], have been
independently developed and made publicly available for 
use by the research community. In contrast to prior ap- 
proaches to Tor research, these testbeds seek to replicate 
in isolation the important dynamics of the live Tor network
at or near scale, complete with directory authorities,
Tor routers, Tor clients, applications, and servers.
While the details of how these tools model the live Tor 

1 See Bauer et al. [16] for a survey of prior methods for Tor research.


network are discussed at length in Section 3, we first 
overview each tool’s distinct approach. 2 
Shadow. To produce high fidelity experiments in a
controlled and repeatable manner, Shadow leverages
discrete-event simulation of the network layer and runs 
real, unmodified application software within the vir- 
tual network topology. Shadow also simulates the ef- 
fects of background Internet traffic by introducing non- 
deterministic jitter and packet loss on links. Shadow 
offers an extensible plug-in framework through which 
an investigator can integrate an application or protocol 
of her choice into the Shadow experimentation environ- 
ment. A plug-in called Scallion for simulating the Tor 
network is available. Important advantages of Shadow 
are that it can simulate large-scale distributed systems 
(such as Tor) on a single well-provisioned machine, re- 
sults can be trivially replicated due to its design, and it 
can scale to arbitrarily sized networks because it runs in 
virtual time. Furthermore, virtual machines are available 
for running Shadow in the cloud on Amazon’s EC2. See 
Shadow’s webpage for more details [11].
ExperimenTor. Similar to Shadow, ExperimenTor of- 
fers the ability to run unmodified Tor software within 
an isolated environment to conduct experiments that are 
faithful to dynamics of the live Tor network. In contrast 
to Shadow’s network simulation approach, Experimen- 
Tor is a network emulation-based testbed, built on the 
mature Modelnet [34] network emulation platform. Ex- 
perimenTor uses one machine to emulate a specified net- 
work topology and another machine (or possibly several 
machines) to run unmodified software within the virtual 
network. Also unlike Shadow, ExperimenTor does not 
endeavor to account for the effects of unrelated back- 
ground Internet traffic on experiments. While Experi- 
menTor cannot easily be run on a single machine, it has 
an advantage of using the operating system’s native net- 
work stack, rather than a simulation. More details about 
ExperimenTor can be found on its webpage [6].

3 Model 

Tor experimentation outside of the live network bene- 
fits from accurate models of network characteristics and 
node behaviors. This section details our approach to 
modeling Tor while discussing alternative approaches 
and common pitfalls. Our model is not intended to 
be a complete set of all characteristics and behaviors 
one could model, but rather the subset that we found 
most important and most useful. Although tested with 
Shadow [23] and ExperimenTor [16], we intend the 
model to apply to a broad range of research problems. 


2 This work describes and uses Shadow version 1.4.0 with Scallion
version 1.3.1, and ExperimenTor as of April 2012. Later versions may 
have new features and capabilities not described here. 



Figure 1: The vertex and edge properties in our modeled 
topology. The topology forms a complete graph. 

3.1 Topology 

We first consider the structure of our experimental net- 
work. Ideally, our network topology would replicate the 
Internet architecture, including all autonomous systems 
(ASes), core, backbone, and edge routers, and all links 
between them. Such a structure would provide the most 
accurate view of the Internet to an experimental frame- 
work. Unfortunately, the exact structure of the Internet 
is unknown and inferring it is an open research problem 
(e.g., [32]). Even if the Internet structure were known, 
it would be extremely large and too inefficient to repli- 
cate for experimental purposes. Therefore, we produce a 
small-scale, manageable model of the Internet. 

Mapping the Internet topology is a major research area 
that has resulted in the development of multiple tools 
and techniques [3, 19,26, 35]. This work utilizes geo- 
graphic clustering by country 3 to scale the Internet down 
to a manageable topology because Tor similarly reports 
statistics about its users, allowing for a natural assign- 
ment of Tor node properties and placement of Tor nodes 
in our topology. Further, our approach produces small 
and efficient complete topologies (see Figure 1): we min- 
imize the number of topology vertices and edges while 
remaining compatible with Tor’s reporting method, and 
do not require routing algorithms to send packets through 
the network backbone. Finally, geographical clustering 
simplifies the process of mapping nodes to vertices, since 
any desired location (IP address) can be mapped to a 
cluster using a wide variety of GeoIP tools (e.g. those 
provided by MaxMind [8]). 

Network Vertices. In our clustering approach, we create 
a network vertex for each country, Canadian province, 
and American state. 4 We take this approach, as op- 
posed to clustering by Autonomous System (AS), be- 
cause geographical clustering most closely resembles the 
actual structure of the Internet: end-users and hardware 
are physically located in clearly defined geographical re- 


3 Note that some Tor research questions may require a more detailed 
model of the Internet topology, a problem future work should consider. 

4 We used Tor’s directly-connecting-user country database [12] to 
form our list of countries, which we supplemented with states and 
provinces from Net Index [9].


gions. ASes, however, typically span multiple geograph- 
ical regions. Further, many properties of network ver- 
tices and edges directly correspond to their geographi- 
cal location, resulting in less variance when aggregating 
measurements of such properties. 

Each vertex is assigned default upstream bandwidth, 
downstream bandwidth, and packet loss properties ob- 
tained from the Ookla Net Index dataset [9]. The dataset 
provides aggregate statistics collected during bandwidth 
speed tests [2] and ping tests [10]. Ookla aggregates millions
of such tests and provides the rolling mean throughput
for each geographic region (vertex in our topology)
over thirty day intervals. The cumulative distributions on 
bandwidths are shown in Figure 2a. 

Network Edges. Each vertex in our topology is con- 
nected to every other vertex, forming a complete graph. 
Each of these pairwise connections is represented as a
network edge. We assign each network edge the follow- 
ing properties: latency (end-to-end packet delay), jitter 
(the variation in packet delay), and packet loss (the frac- 
tion of packets that are dropped). Note that full end-to- 
end loss rates are computed by combining the loss rates 
of the source and destination vertices and the connecting 
edge. Due to the lack of accurate loss rate measurements 
in the Internet core, our model currently utilizes only vertex
loss rates from Ookla [9].
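
The text does not state how the three loss rates are combined. A minimal sketch, assuming drops at the source vertex, the edge, and the destination vertex are independent (our assumption, not a rule given above), multiplies the survival probabilities. The function name is hypothetical.

```python
def end_to_end_loss(src_loss: float, edge_loss: float, dst_loss: float) -> float:
    """Combine loss fractions along a path, assuming independent drops (an assumption)."""
    survive = (1.0 - src_loss) * (1.0 - edge_loss) * (1.0 - dst_loss)
    return 1.0 - survive


# The edge loss is 0.0 here because the model currently uses only vertex loss rates.
print(end_to_end_loss(0.01, 0.0, 0.02))
```

With an edge loss of zero, the result depends only on the two vertex loss rates, which matches the model's current reliance on vertex measurements.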

To model edge latency in our topology, we use round 
trip times (RTTs) measured by the iPlane [26] latency 
estimation service. 5 iPlane gathers RTTs from several 
vantage points, including PlanetLab nodes and traceroute
servers, on a daily basis [7]. We use RTT/2 to approximate
latency between each pair of iPlane nodes. 6 We then use GeoIP
lookup to assign each iPlane node to a network vertex, 
and therefore each estimated latency value corresponds 
to a network edge. Since there may not be an iPlane node 
corresponding to every network vertex (because there is 
not an iPlane node in every country), we create a tem- 
porary virtual overlay topology containing only nine “re- 
gional” clusters (e.g. US East, US West, EU East, EU 
West, etc.) and aggregate our latency estimates on the 
corresponding regional overlay edges. Then, we assign 
each network edge for which we have no RTT measure- 
ments the median latency value from the corresponding 
overlay edge. Figure 2b shows the iPlane latency esti- 
mates between common regional overlay edges, and con- 
firms that an increase in physical distance between nodes 
implies an increase in latency. Finally, we approximate 
jitter over our network edges as where IQR is the 
edge latency inter-quartile range. 
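
The regional fallback described above can be sketched as follows. This is an illustration under stated assumptions rather than the exact procedure: one-way latency is taken as half of each RTT, measured edges are aggregated with the median, and edges are treated as symmetric. The function name, the argument layout, and the sample vertices are hypothetical. Jitter is omitted.

```python
from collections import defaultdict
from itertools import combinations
from statistics import median


def edge_latencies(rtts_ms, region_of, vertices):
    """Return {(a, b): one-way latency in ms} for unordered vertex pairs.

    rtts_ms:   {(vertex, vertex): [rtt_ms, ...]} measured round trip times
    region_of: {vertex: regional cluster name} for the temporary overlay
    vertices:  all vertices in the topology
    """
    per_edge = defaultdict(list)
    per_overlay = defaultdict(list)
    for (a, b), rtts in rtts_ms.items():
        one_way = [rtt / 2.0 for rtt in rtts]  # assumption: latency is half the RTT
        per_edge[tuple(sorted((a, b)))].extend(one_way)
        per_overlay[tuple(sorted((region_of[a], region_of[b])))].extend(one_way)

    latencies = {}
    for edge in combinations(sorted(vertices), 2):
        if edge in per_edge:
            latencies[edge] = median(per_edge[edge])
            continue
        # No direct measurement: fall back on the regional overlay edge median.
        overlay = tuple(sorted(region_of[v] for v in edge))
        if overlay in per_overlay:
            latencies[edge] = median(per_overlay[overlay])
    return latencies


regions = {"US-NY": "US East", "US-CA": "US West", "DE": "EU West", "FR": "EU West"}
rtts = {("US-NY", "DE"): [80, 100], ("US-CA", "DE"): [150, 170]}
print(edge_latencies(rtts, regions, regions.keys()))
```

Edges whose regional overlay edge also lacks measurements are left out of the result, so a caller can see which edges still need a value.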


5 The traceroutes were collected on 2012-03-28. 

6 Although Internet paths may be asymmetric, we found RTT/2 a suitable
approximation of edge latency after aggregating measurements.
Figure 2: (a) Topology vertex bandwidths from Net Index, and estimated relay bandwidths from published relay documents. (b)
Topology edge latency. (c) Sampling relays for scaled-down Tor experiments. Our sampling algorithm produces the best fit to the
original relay bandwidth distribution by minimizing the area between the CDF curves. 


3.2 Hosts 

Once we have configured a topology, we next configure 
hosts that operate in that topology. In the context of a 
Tor network, we are most concerned with Tor relays, Tor
clients, Tor authorities, and Internet web/file servers. Although
the live Tor network contains thousands of relays
and hundreds of thousands of clients, it is often the case 
that experiments must be scaled down significantly due 
to hardware limitations. We now explain our approach to 
scaling down the Tor network for each host type. 

Tor Relays. Relays are an important part of a Tor net- 
work model, as each of them donates bandwidth and pro- 
vides the forwarding service upon which the network 
is built. Recall that when building circuits, clients se- 
lect relays according to weights published in the consen- 
sus. These selection-weights direct clients to relays ac- 
cording to each relay’s perceived throughput, and have a 
dramatic impact on relay and network load and conges- 
tion [31]. Therefore when scaling down from the thou- 
sands of relays in the Tor consensus to a manageable 
number for experiments, it is important that the distri- 
bution of selection-weights in the scaled network is as 
close as possible to that of the live network. Past work 
has sampled uniformly at random from the existing set of 
relays [24] when choosing relays for experiments. Un- 
fortunately, the distribution that results from randomly 
selecting relays may not fit the original selection-weight 
distribution well. We now describe an algorithm that pro- 
duces the best fit sample of the original distribution while 
quantifying its improvement over random selection. 

To scale the number of relays down to K of N, we split
a sorted list of N relay selection-weights into K bins and
choose the median weight from each bin. The resulting
weight distribution best fits that of the original relay
population: any non-median weight value would only increase
the distance between the distributions. This approach
is detailed in Algorithm 1. To quantify our algorithm’s
effectiveness, we compare it to random selection

Algorithm 1: Sample relay bandwidths to produce a distribution that best fits that of the original relay population
Input: sorted list C of N relay bandwidths, sample size K < N
Output: sorted list of sampled bandwidths S

    n ← floor(N / K);
    r ← N − n · K;
    i ← 0;
    for k ← 0 to K − 1 do
        j ← i + n;
        if k < r then j ← j + 1;
        bin ← C.slice(i, j); // range [i, j)
        S.add(median(bin));
        i ← j


using the difference from the original relay weight distribution
as a metric. This is calculated as the integral of
the absolute value of the difference between the sampled
CDF s(x) and the Tor relay selection-weight CDF f(x):
$\int |f(x) - s(x)|\,dx$. The result is then normalized. Figure 2c
compares the distribution of this closeness metric
for 1000 samples of K relays using our algorithm and
random sampling. (We found insignificant variance in
the sample distributions when choosing K ∈ [50, 1000].)
While our algorithm always produces the best fit result 
for each sample (hence the vertical line in Figure 2c), 
random sampling produces weight distributions as far as 
ten percent from optimal. 
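
The binned-median sampling and the closeness metric can be sketched in Python as below. This is a minimal illustration, not the authors' implementation. The names `sample_relays` and `cdf_area` are hypothetical, `median_low` is our choice so that every sampled value is an actual relay weight, and the area is left unnormalized because the normalization is not specified above.

```python
from statistics import median_low


def sample_relays(weights, k):
    """Split the sorted weights into k contiguous bins and keep each bin's median."""
    weights = sorted(weights)
    if not 0 < k <= len(weights):
        raise ValueError("k must be between 1 and the number of relays")
    n, r = divmod(len(weights), k)  # base bin size, and how many bins get one extra item
    sample, i = [], 0
    for b in range(k):
        j = i + n + (1 if b < r else 0)
        sample.append(median_low(weights[i:j]))  # bin covers the range [i, j)
        i = j
    return sample


def cdf_area(original, sample):
    """Unnormalized area between the two empirical CDFs over the weight axis."""
    xs = sorted(set(original) | set(sample))
    area = 0.0
    for lo, hi in zip(xs, xs[1:]):
        f = sum(w <= lo for w in original) / len(original)
        s = sum(w <= lo for w in sample) / len(sample)
        area += abs(f - s) * (hi - lo)
    return area


weights = list(range(1, 11))
print(sample_relays(weights, 3))  # → [2, 6, 9]
```

Comparing `cdf_area(weights, sample_relays(weights, k))` against the same metric for `random.sample(weights, k)` over many trials would reproduce the style of comparison described for Figure 2c.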

We draw two samples of relays from those listed in 
the consensus: one for exit relays (discussed below) and 
one for non-exit relays. We then consider several relay 
properties. First, we assign each relay to the network ver- 
tex in our topology corresponding to its geographic loca- 
tion (found by GeoIP lookup of the relay’s IP address). 
This allows communication between relays while also 
resulting in latencies between relays that correlate with 
physical distances. Next we compute relay rate limits 7 

7 A relay operator may limit the amount of bandwidth its relay con- 


Algorithm 2: Estimate relay upstream and downstream capacities using data published in the consensus, server descriptors, and extra infos
Input: consensus weights C, max bw bursts B, max read and write bw histories R and W
Output: capacity up U and down V

    for i ← 0 to getRelayCount() − 1 do
        if B[i] > 0 then
            if R[i] > 0 and W[i] > 0 then
                ratio ← R[i] / W[i];
                if ratio > 1 then
                    U[i] ← B[i];
                    V[i] ← (B[i] · ratio);
                else
                    V[i] ← B[i];
                    U[i] ← (B[i] · 1/ratio);
            else U[i] ← V[i] ← B[i];
        else if R[i] > 0 and W[i] > 0 then
            U[i] ← W[i];
            V[i] ← R[i];
        else U[i] ← V[i] ← C[i];








and access link capacities, the most important properties 
affecting the resources each relay provides and the ex- 
pected client performance in our modeled network. Rate 
limits are taken from the public server descriptors [12] of 
our sampled relays. Capacities must be estimated. 

Since a relay’s ISP access link capacities are not di- 
rectly measured or published, we estimate these val- 
ues using historical bandwidth measurements published 
in server descriptors and extra info documents, and the 
weights published in the consensus. The published doc- 
uments include: bandwidth weights — values used dur- 
ing circuit construction to help distribute client load to 
faster relays; observed bandwidth — the smaller of the 
maximum sustained input and output over any ten second 
interval; and read/write bandwidth histories — the maxi- 
mum sustained input and output over any fifteen-minute 
interval. We prefer the observed bandwidth as the best 
estimate of capacity. Since only the smaller of the input 
and output observed bandwidth is published, we use the 
read/write histories to infer to which the published value 
corresponds, and the ratio of read/write histories to esti- 
mate the unpublished observed value. In the absence of 
observed bandwidth information, we use read/write his- 
tories directly, and otherwise fall back on the bandwidth 
weights. A detailed specification is provided in Algo- 


sumes by configuring a token bucket rate-limiter: the token bucket size 
and refill rate can be configured by setting BandwidthBurst and 
BandwidthRate in the configuration file. 


rithm 2. The distribution on relay bandwidths computed 
using Algorithm 2 is shown in Figure 2a. 
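
The fallback order described in this section can be sketched for a single relay as follows. This is an illustration of our reading of the prose, not a definitive implementation: the function name and argument names are hypothetical, a value of 0 stands for missing data, and the ratio is assumed to be read history over write history, with writes mapped to upstream and reads to downstream.

```python
def estimate_capacities(weight, observed, read_hist, write_hist):
    """Return (up, down) capacity estimates for one relay.

    weight:     consensus bandwidth weight (last-resort fallback)
    observed:   published observed bandwidth, the smaller of input and output
    read_hist:  maximum read bandwidth history value
    write_hist: maximum write bandwidth history value
    """
    have_hist = read_hist > 0 and write_hist > 0
    if observed > 0:
        if not have_hist:
            return observed, observed
        ratio = read_hist / write_hist
        if ratio > 1:
            # Reads exceed writes, so the published minimum is the upstream value.
            return observed, observed * ratio
        # Reads do not exceed writes, so the published minimum is the downstream value.
        return observed / ratio, observed
    if have_hist:
        return write_hist, read_hist
    return weight, weight


print(estimate_capacities(weight=100, observed=1000, read_hist=2000, write_hist=1000))
```

The three branches mirror the stated preference order: observed bandwidth first, then read/write histories, then consensus weights.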

Note that a relay’s observed bandwidth is only a good 
estimator of capacity when the relay was not limiting its 
rate during at least one ten second interval, and the relay 
had enough clients to consume its available bandwidth. 
Otherwise, the observed bandwidth is an underestimate 
of a relay’s true capacity. This is corroborated in Fig- 
ure 2a: upstream and downstream estimates are mostly 
symmetric due to the reliance on observed Tor bandwidth 
and Tor’s circuit design, and the relay capacities appear 
far less than the expected upstream and downstream ca- 
pacities from Net Index. We plan to explore passive 
measurement techniques, such as packet trains [22], to 
directly measure relay capacities in future work. Such 
measurements would provide a significantly better data 
source for modeling capacities than currently available. 

The last part of modeling relays is adjusting their Tor 
configuration. As mentioned above, we sample relays
that will exit Tor traffic separately from those that will not.
Both exit and non-exit relays require the ORPort option
to be configured as relays, while exit relays additionally
require a configured ExitPolicy. (Exit policies
may be found in relays’ server descriptors.) Other
notable configurations include TestingTorNetwork
to help with bootstrapping in our test environment, and
DirServer to specify our custom directory authorities.
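
As a rough sketch of how these options fit together, the helper below renders a configuration file for one modeled relay. The function name, the port, and the placeholder DirServer argument are hypothetical. The explicit reject-all policy for non-exit relays is an assumption of this sketch, not something stated above.

```python
def render_torrc(or_port, dir_server, exit_policy=None):
    """Render a minimal torrc for one modeled relay.

    or_port:     port number for the ORPort option
    dir_server:  full argument string for the DirServer option (placeholder here)
    exit_policy: list of policy rules for an exit relay, or None for a non-exit relay
    """
    lines = [
        "TestingTorNetwork 1",
        f"DirServer {dir_server}",
        f"ORPort {or_port}",
    ]
    if exit_policy is None:
        # Assumption of this sketch: non-exit relays explicitly reject all exit traffic.
        lines.append("ExitPolicy reject *:*")
    else:
        lines.extend(f"ExitPolicy {rule}" for rule in exit_policy)
    return "\n".join(lines) + "\n"


# Hypothetical values; a real DirServer line needs the authority's address and fingerprint.
print(render_torrc(9001, "<nickname> <address:port> <fingerprint>", ["accept *:80", "reject *:*"]))
```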
Tor Authorities. Tor directory authorities are responsi- 
ble for creating, signing, and distributing the consensus 
document — a list of all available relays and their associ- 
ated bandwidth weights. Tor bandwidth authorities mea- 
sure the expected performance of each relay and use the 
relative measured performance to compute the consensus 
weights used by clients for relay selection. In the live 
Tor network, the bandwidth measurement functionality 
is provided by a set of scripts known as TorFlow [13].

Our model selects the fastest sampled non-exit relay as 
the directory authority (all Tor directory authorities are 
currently non-exit relays). Since our test network lacks 
TorFlow, we must ensure that the bandwidth weights that 
appeared in the live network consensus also appear in our 
test network consensus. This is done by writing a .v3bw
file with the live network bandwidth weights in the directory
authority’s data directory, as is done in live Tor.
Lacking a valid .v3bw bandwidth file, the authorities
will fall back on relays’ reported observed bandwidth. In 
this case, we must remove a software-defined limit 8 on 
the observed bandwidth to allow relays to report the cor- 
rect consensus weight. Note that although clients will 
be selecting relays in our test network using the same 
weights as the live network, the probability that each re- 

8 Directory authorities will not trust any self-reported relay bandwidth
over DEFAULT_MAX_BELIEVABLE_BANDWIDTH, which is set
to a default value of 10 MiB/s.


Table 1: The ten countries with the highest reported Tor connecting user counts [12] during January, 2012.

Country              %        Country              %
United States        16.46    Spain                5.08
Iran                 12.63    Russia               3.46
Germany               9.99    Republic of Korea    2.66
Italy                 6.96    United Kingdom       2.39
France                6.30    Saudi Arabia         2.38


lay is selected necessarily increases (we downsampled 
the relays and the sum of the probabilities must equal 1). 
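
The effect of downsampling on selection probability can be illustrated with a small sketch. It assumes selection strictly proportional to consensus weight, which simplifies Tor's actual relay selection. The relay names and weights are made up for illustration.

```python
def selection_probabilities(weights):
    """Map each relay to weight / total weight (proportional selection, a simplification)."""
    total = sum(weights.values())
    return {relay: w / total for relay, w in weights.items()}


live = {"relayA": 100, "relayB": 300, "relayC": 600}  # hypothetical live weights
sampled = {"relayB": 300, "relayC": 600}              # same weights after downsampling

print(selection_probabilities(live)["relayB"])
print(selection_probabilities(sampled)["relayB"])
```

The weights are unchanged, but each remaining relay's share of the total grows because the total shrinks.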
Tor Clients. In our model, Tor clients are the main
source of network load, producing all of the exit-bound 
traffic routed through Tor while simultaneously serving 
to measure network performance. Clients perform syn- 
chronous HTTP GET requests to download files through 
our modeled Tor network. Clients choose HTTP servers 
from which to request each download uniformly at ran- 
dom. Since the requests are synchronous, each client will 
be responsible for at most one stream through Tor at any 
time. Each client measures the time from when it ini- 
tiates a connection to the SOCKS application proxy to 
the first byte and last byte of the file payload, indicating 
network responsiveness and performance. 
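
The two timings can be sketched as below. This is a hedged illustration: `chunks` is a hypothetical iterable that yields payload bytes as they arrive, and the timer is assumed to start at the moment the client would initiate its proxy connection. The injectable `clock` parameter exists only to make the sketch testable.

```python
import time


def time_download(chunks, clock=time.monotonic):
    """Return (time to first byte, time to last byte, total bytes) for one download."""
    start = clock()  # assumed to coincide with initiating the proxy connection
    first = None
    total = 0
    for chunk in chunks:
        if first is None and chunk:
            first = clock() - start  # first payload byte has arrived
        total += len(chunk)
    last = clock() - start  # payload fully received
    return first, last, total


first, last, size = time_download([b"ab", b"cd"])
print(size)  # → 4
```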

Our model classifies clients into two broad categories: 
web clients and bulk clients. Each web client requests 
320 KiB files, the average webpage size according to recent
web metrics [30]. After completing a download, a
web client will pause for a time drawn uniformly at random
from a range of [1,20] seconds before initiating the
next download to simulate the time a user takes to consume
the web page content. Each bulk client requests
5 MiB files without pausing between the completion of
one download and the initiation of the next. Our client 
model is based on work characterizing Tor exit traffic by 
McCoy et al. [27]. This work found that roughly 60% of 
the bytes and 95% of the connections exiting Tor were at- 
tributable to HTTP traffic while roughly 40% of the bytes 
and 5% of the connections were attributable to BitTorrent
traffic. Therefore, we use a 19:1 ratio of web to bulk
clients. The total number of clients is dependent on the 
number of relays and their capacities (see Section 4). 
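
The client mix and per-download behavior can be sketched as follows. The function names and the rounding rule for the number of bulk clients are assumptions of this sketch. The file sizes, pause range, and 19:1 ratio come from the description above.

```python
import random

WEB_SIZE = 320 * 1024        # 320 KiB web download
BULK_SIZE = 5 * 1024 * 1024  # 5 MiB bulk download


def make_clients(num_clients, web_per_bulk=19):
    """Return a list of 'web' and 'bulk' labels in a web_per_bulk:1 ratio."""
    num_bulk = round(num_clients / (web_per_bulk + 1))
    return ["bulk"] * num_bulk + ["web"] * (num_clients - num_bulk)


def next_request(kind, rng):
    """Return (bytes to download, seconds to pause after the download completes)."""
    if kind == "web":
        return WEB_SIZE, rng.uniform(1.0, 20.0)
    return BULK_SIZE, 0.0


clients = make_clients(500)
print(clients.count("web"), clients.count("bulk"))  # → 475 25
```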

Each client is assigned a geographical location and the 
corresponding network vertex in our topology according 
to Tor’s directly connecting user statistics [12,21]. These 
statistics specify the country from which clients connect 
when directly downloading Tor directory information. 
The top ten countries from a recent version of this data 
are shown in Table 1. When assigning a client to a vertex,
the assignment is weighted by the given percentages.
Each client’s ISP connection upstream and downstream 
capacities are taken from the default vertex properties as 
measured by Net Index [9] (see Section 3.1). 
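
Weighted vertex assignment can be sketched with the standard library. For brevity this example uses only the ten countries listed in Table 1, so the weights do not cover the full user distribution. The function name is hypothetical.

```python
import random

# Percentages of directly connecting users, taken from Table 1 (top ten only).
USER_SHARE = {
    "United States": 16.46, "Iran": 12.63, "Germany": 9.99, "Italy": 6.96,
    "France": 6.30, "Spain": 5.08, "Russia": 3.46, "Republic of Korea": 2.66,
    "United Kingdom": 2.39, "Saudi Arabia": 2.38,
}


def assign_vertices(num_hosts, shares, rng):
    """Draw one vertex per host, with probability proportional to its share."""
    vertices = list(shares)
    weights = [shares[v] for v in vertices]
    return rng.choices(vertices, weights=weights, k=num_hosts)


placement = assign_vertices(1000, USER_SHARE, random.Random(1))
print(placement[:5])
```

The same routine could be reused for server placement by passing a different share table.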

Internet Servers. In our model, HTTP servers are the 
destinations of our client requests and the sources of 


Table 2: The ten countries with the highest number of servers in the Alexa top 1 million data set [1] during January, 2012.

Country              %        Country              %
United States        47.94    France               3.64
Germany               8.65    Russia               3.40
China                 4.50    Netherlands          2.86
United Kingdom        4.20    Canada               2.10
Japan                 3.73    Italy                1.48


the files downloaded through Tor. In order to attribute 
changes in performance to Tor itself while minimizing 
effects external to the network, we assign Internet servers 
100 MiB/s bandwidth capacities. This high capacity will
prevent our Internet servers from becoming bottlenecks 
during our client downloads. The geographic locations 
of Internet servers are assigned using the Alexa Top Sites 
data set [1]. Since the Alexa ranking may not capture the 
usage patterns of Tor users well, we instead produce a 
distribution on location of the reported top one million 
sites. 9 The top ten countries with the most sites in the 
Alexa data set are given in Table 2. Our assignment of 
server to topology vertex is weighted by this distribution, 
similar to our client vertex assignment. 

4 Methodology and Experiments 

To determine the accuracy of and increase the confidence 
in our Tor network model, we instantiate it using two 
state-of-the-art Tor experimentation tools: Shadow [23] 
and ExperimenTor [16] (see Section 2 for background). 
This section compares the performance and load charac- 
teristics of the environments produced with each tool to 
that of the live Tor network, illustrating the effectiveness 
of our modeling strategies from Section 3. We choose 
network performance and load because Tor already mea- 
sures these characteristics on the live network, allowing 
for a direct comparison of results. Further, these metrics 
represent the gauges in which clients and relays are gen- 
erally interested, and are most useful when developing 
new algorithms that improve the state of the network. 

We test our model with two different network sizes, 
both of which are scaled down versions of the live Tor 
network. In our small network, we configure 50 re- 
lays and 500 clients that communicate with 50 HTTP 
file servers. In our large network, we configure 100 re- 
lays and 1000 clients that communicate with 100 HTTP 
file servers. The small and large networks are approxi- 
mately fifty and twenty-five times smaller than the size 
of Tor, respectively. Both Shadow and ExperimenTor 
use instantiated versions of our Tor network model 10 
and are configured to run a vanilla instance of version 
0.2.3.13-alpha of the Tor software for ninety virtual min- 
utes. Download results are ignored during the first thirty 
minutes of each experiment to allow for Tor’s bootstrap- 

9 We find locations with standard DNS queries and GeoIP lookups. 

10 The topology files are available on the Shadow website [11].




(a) 320 KiB clients (b) 5 MiB clients (c) 320 KiB clients (d) 5 MiB clients 

Figure 3: Performance for live Tor and our small modeled network of 50 relays and 500 clients in Shadow and ExperimenTor. 
Time to the first byte of the data payload is shown in (a) and (b), and time to the last byte in (c) and (d), for various download sizes. 




(a) 320 KiB clients (b) 5 MiB clients (c) 320 KiB clients (d) 5 MiB clients 

Figure 4: Performance for live Tor and our large modeled network of 100 relays and 1000 clients in Shadow and ExperimenTor. 
Time to the first byte of the data payload is shown in (a) and (b), and time to the last byte in (c) and (d), for various download sizes. 


ping process. File download timings during the remain- 
ing period are utilized as discussed below. 

Note that we explored various numbers of clients and 
found that a 10:1 client-to-relay ratio in our experiments 
resulted in load and network performance that reason- 
ably approximated that of the live Tor network [12]. We 
stress that this client-to-relay ratio is due to our client 
modeling strategies; alternative client behaviors may re- 
quire an adjusted ratio to produce the network charac- 
teristics that best approximate Tor. Accurately modeling 
Tor client behaviors is an open research problem which 
future work should consider. 

4.1 Network Performance 

We compare client performance measured in our test en- 
vironments to client performance in Tor during the same 
period we are modeling. 11 We measure the time to the
first and last byte of the data payload of our 320 KiB 
and 5 MiB file downloads as indications of network re- 
sponsiveness and throughput. We compare our results 
to live Tor network performance measured with torperf 
[14], a tool that monitors live Tor network performance 
by downloading files of sizes 50 KiB, 1 MiB, and 5 MiB. 
Performance for our small and large networks is shown
in Figures 3 and 4, respectively.
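As a rough sketch of how the two metrics can be derived from raw timestamps and summarized by percentile, consider the following. The tuple layout, the function names, and the nearest-rank percentile definition are assumptions made for illustration; they are not taken from torperf or from either experimentation tool.

```python
import math


def time_to_first_byte(request_time, first_byte_time):
    """Responsiveness: delay until the first payload byte arrives."""
    return first_byte_time - request_time


def time_to_last_byte(request_time, last_byte_time):
    """Throughput indicator: delay until the whole payload has arrived."""
    return last_byte_time - request_time


def percentile(values, p):
    """Nearest-rank percentile for 0 < p <= 100 (one of several definitions)."""
    if not values:
        raise ValueError("no values to summarize")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical (request, first byte, last byte) timestamps in seconds.
    downloads = [(0.0, 0.8, 4.0), (10.0, 11.2, 16.5), (20.0, 20.6, 23.1)]
    ttfb = [time_to_first_byte(r, f) for r, f, _ in downloads]
    ttlb = [time_to_last_byte(r, l) for r, _, l in downloads]
    print("median TTFB:", percentile(ttfb, 50))
    print("median TTLB:", percentile(ttlb, 50))
```

Comparing such percentiles across live Tor and each tool corresponds to reading the curves in Figures 3 and 4 at a fixed height.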

We expect client performance in our test environments 
to be similar to that in Tor. In particular, the time-to-first- 
byte should be consistent regardless of the size of the file 
being downloaded. As can be seen in Figures 3a, 3b, 

11 This work models Tor as it existed during January 2012.


4a, and 4b, our model produces accurate time-to-first- 
byte performance in both tools, although the tools tend to 
lose some accuracy above the eightieth percentile. Under 
the time-to-last-byte metric, we expect our 320 KiB web 
downloads to complete somewhere between the torperf 
50 KiB and 1 MiB downloads, and our 5 MiB download
times to be consistent with torperf. Web download times 
are more accurate in ExperimenTor in the large network 
(Figure 4c) than the small (Figure 3c), and all downloads 
tend to take slightly longer in ExperimenTor than in live 
Tor. Shadow approximates web download times reason- 
ably well (Figures 3c and 4c), and bulk downloads com- 
plete slightly faster in Shadow than in Tor (Figures 3d 
and 4d). Overall, we are impressed that our model en- 
ables both tools to characterize Tor performance closely, 
even with scaled-down Tor networks. 

4.2 Network Load 

Each relay in Tor tracks byte histories: the number of 
bytes read and written over time. We use these statistics 
to calculate the throughput of each relay included in our 
small and large networks, and directly compare through- 
puts from Tor with throughputs from our experimenta- 
tion environments. The results are shown in Figure 5. 
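One way to turn byte histories into throughput values is sketched below. It assumes each history is a list of byte counts over consecutive intervals of equal length; the function names, the interval length, and the choice of which counter (read or written) to use are illustrative assumptions rather than details given in the text.

```python
MIB = 2 ** 20  # bytes per MiB


def relay_throughput(byte_counts, interval_seconds):
    """Mean throughput in MiB/s over consecutive fixed-length intervals."""
    if not byte_counts or interval_seconds <= 0:
        raise ValueError("need at least one interval of positive length")
    total_seconds = len(byte_counts) * interval_seconds
    return sum(byte_counts) / total_seconds / MIB


def aggregate_throughput(histories, interval_seconds):
    """Sum of per-relay mean throughputs, as reported in a figure legend."""
    return sum(relay_throughput(h, interval_seconds) for h in histories.values())


if __name__ == "__main__":
    # Hypothetical histories: bytes written per 900-second interval.
    histories = {
        "relayA": [900 * MIB, 900 * MIB],
        "relayB": [450 * MIB, 450 * MIB],
    }
    for name, history in histories.items():
        print(name, relay_throughput(history, 900), "MiB/s")
    print("aggregate", aggregate_throughput(histories, 900), "MiB/s")
```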

The aggregate throughput for all the relays we chose 
in our small network (Figure 5a) totaled 27.6 MiB/s for
live Tor, 31.1 MiB/s in Shadow, and 33.1 MiB/s in Ex- 
perimenTor. In our large network (Figure 5b), the aggre- 
gate throughput was 44.8 MiB/s in live Tor, 58.4 MiB/s in 
Shadow, and 62.2 MiB/s in ExperimenTor. These results 
indicate that our experimental networks were too heav- 

Figure 5: Load in live Tor and (a) our small modeled network of 50 relays and 500 clients, and (b) our large modeled network of
100 relays and 1000 clients. Throughput is indexed by each relay chosen in our model and the sum is shown in the legend. (c) The
distribution of the normalized experimental throughput error from reported live Tor throughput.


ily loaded, and the absolute error increased with the net- 
work size. The distribution on the normalized individual 
relay throughput error is shown in Figure 5c. The distri- 
butions have long tails: the maximum normalized error 
was 34.9% for Shadow and 28.6% for ExperimenTor in 
the small network, and 23.9% for Shadow and 22.5% for 
ExperimenTor in the large network. Although the abso- 
lute error increased with the network size, the individ- 
ual errors decreased in the larger network. Our anal- 
ysis found that most throughput error was attributable 
to bootstrapping issues: recently added fast relays were 
under-utilized in Tor but fully utilized in our experi- 
ments. Despite these issues, over 95% of relays in the 
small network and 98% of relays in the large network 
had less than 10% throughput error. 
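The aggregate figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the throughput totals reported in the text; the relative overshoot it derives is a computed quantity and is not the per-relay normalized error plotted in Figure 5c, whose exact normalization is not restated here.

```python
# Aggregate throughput in MiB/s, as reported in the text.
LIVE = {"small": 27.6, "large": 44.8}
EXPERIMENT = {
    "small": {"Shadow": 31.1, "ExperimenTor": 33.1},
    "large": {"Shadow": 58.4, "ExperimenTor": 62.2},
}


def aggregate_overshoot(live, measured):
    """Return (absolute error in MiB/s, error relative to live throughput)."""
    absolute = measured - live
    return absolute, absolute / live


if __name__ == "__main__":
    for size, tools in EXPERIMENT.items():
        for tool, measured in tools.items():
            absolute, relative = aggregate_overshoot(LIVE[size], measured)
            print(f"{size:5s} {tool:12s} +{absolute:.1f} MiB/s ({relative:.1%})")
```

The absolute overshoot is 3.5 and 5.5 MiB/s in the small network versus 13.6 and 17.4 MiB/s in the large network, which matches the observation that the absolute error grew with network size.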

5 Lessons Learned 

Modeling a distributed system is a complex process. 
During this process, we found that it is important to use 
real Internet and system measurements to eliminate ar- 
bitrary modeling decisions, as this tends to have a sig- 
nificant impact on how accurately the experimental envi- 
ronment replicates the real distributed system. However, 
measurements should not be used until they are fully un- 
derstood (what they mean and how they are useful), or 
they may harm accuracy. 

We also found it important to determine useful metrics 
that allow for a comparison between the experimental 
platform and the real distributed system being modeled. 
Useful metrics and proper comparisons of measurements 
increase confidence in the obtained results. Useful met- 
rics assist in understanding the strengths and weaknesses 
of a model, and help determine if the environment pro- 
duced from the model is suitable for the given research. 

We discovered that it is very useful to replicate experiments
on multiple experimental platforms. This can help
identify errors or peculiarities caused by a specific tool. 
For example, this process allowed us to discover that
TCP packets without a data payload were not consuming
bandwidth on Shadow’s virtual network interfaces: their
packet header overhead was not being counted. Shadow’s
accuracy improved greatly after accounting for TCP packet
header overhead on both data and control packets.
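The effect of this accounting change can be illustrated with a small sketch. It is not Shadow's implementation: the function names are hypothetical, and the 40-byte per-packet overhead (minimum IPv4 plus minimum TCP header, without options) is an assumed constant chosen for illustration.

```python
# Assumed per-packet overhead: 20-byte IPv4 header + 20-byte TCP header.
HEADER_BYTES = 40


def payload_only_bytes(payload_sizes):
    """Accounting that ignores headers: empty control packets cost nothing."""
    return sum(payload_sizes)


def wire_bytes(payload_sizes, header_bytes=HEADER_BYTES):
    """Accounting that charges header overhead on every packet,
    including control packets (such as ACKs) that carry no payload."""
    return sum(size + header_bytes for size in payload_sizes)


if __name__ == "__main__":
    # Hypothetical exchange: two full data segments, each followed by an ACK.
    packets = [1460, 0, 1460, 0]
    print("payload only:", payload_only_bytes(packets), "bytes")
    print("with headers:", wire_bytes(packets), "bytes")
```

Under payload-only accounting the two empty packets consume no bandwidth at all, whereas charging headers on every packet makes them count toward each interface's rate limit.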

Finally, it is important to understand that Experi- 
menTor and Shadow have fundamentally different ap- 
proaches to experimentation: Shadow simulates all net- 
work properties including jitter and packet loss on links 
due to the presence of background Internet traffic. In 
contrast, ExperimenTor emulates link properties simply 
as a function of the Tor traffic load, ignoring any ef- 
fects due to background Internet traffic. As in our exper- 
iments, differences between other tools may also con- 
tribute to the differences in experimental performance 
and load, and should be considered when analyzing com- 
parative experiments. 

6 Conclusion 

This paper explored modeling the distributed Tor net- 
work. We provided precise and detailed specifications 
of our modeling choices and their effect on the result- 
ing experimental environment. We validated our model 
by instantiating it in two state-of-the-art Tor experimen- 
tation tools: Shadow [23] and ExperimenTor [16]. We 
compared network performance and network load from 
our experiments to real Tor data and found that our model 
leads to environments that characterize the live Tor net- 
work well. Finally, we provided insights into the lessons 
we learned while replicating our experiments with both 
experimentation tools. 

Future Work. There are several ways in which our 
model could be improved. First, we could increase the 
size of our network by improving the software support
and acquiring the hardware resources necessary for
handling larger networks in the available experimenta- 
tion tools. Running at or near scale means we may reduce 
experimentation artifacts, such as those created because 
relay selection probabilities necessarily change when us- 
ing only a subset of the existing relays. Larger networks 


will also provide a more realistic experimentation envi- 
ronment and more realistic results. 

Second, our model may benefit from capacity and link 
characteristics gathered directly from Tor relays. This 
would give us precise statistics about the specific nodes 
we are modeling and reduce our reliance on external 
sources of more generic information for links between 
relays. One possible approach to capacity measure- 
ment involves using packet trains [22], but more work 
is needed to determine the efficacy of such techniques in 
the context of the Tor network. At the same time, many 
research questions may require a more detailed topology 
structure than that modeled in this paper. Higher fidelity 
of the underlying network topology may be possible by 
combining data from iPlane [7] and CAIDA [3].

Third, determining a better client model would further 
increase confidence in experimental results. Producing a 
more robust client model will likely require the develop- 
ment of algorithms for collecting client statistics in a way 
that mitigates privacy risks. While this is challenging 
since client behaviors are dynamic and hard to capture 
in a representative fashion, it would allow us to increase 
faithfulness to the live Tor network and its users. Finally, 
modeling malicious adversaries and their behaviors may 
be of specific interest to future research that analyzes the 
security of Tor or its algorithms. 